Feature-Rich Discriminative Phrase Rescoring for SMT

Authors

  • Fei Huang
  • Bing Xiang

Abstract

This paper proposes a new approach to phrase rescoring for statistical machine translation (SMT). It introduces a set of novel features that capture the translingual equivalence between a source phrase and a target phrase. These features are combined using a linear regression model and a neural network to predict a quality score for each phrase translation pair. The phrase scores are then used to discriminatively rescore the baseline MT system's phrase library, boosting good phrase translations while pruning bad ones. This approach not only significantly improves machine translation quality but also reduces the model size by a considerable margin.
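The abstract does not specify the feature set, the training targets, or the exact rescoring formula. The following is therefore only a minimal sketch of the general idea, under loudly stated assumptions. Each phrase pair is assumed to carry a numeric feature vector, and a linear regression is fit by least squares to hypothetical quality labels. The predicted score is then added to the pair's baseline log score, and pairs whose prediction falls below a hypothetical threshold are pruned. All of the names `PhrasePair`, `fit_linear_regression`, `predict`, `rescore`, `prune_below`, and `boost` are illustrative, not the authors'. Only the linear-regression variant is shown; the neural-network variant is omitted.

```python
from dataclasses import dataclass


@dataclass
class PhrasePair:
    source: str
    target: str
    features: list[float]  # hypothetical translingual-equivalence features
    log_prob: float        # hypothetical baseline phrase-table score (log domain)


def fit_linear_regression(X, y, ridge=1e-6):
    """Return weights [bias, w1, ..., wk] solving (A^T A + ridge*I) w = A^T y,
    where A is X with a leading column of ones for the bias term."""
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    # Normal equations; the tiny ridge term keeps the system non-singular.
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w, x):
    """Predicted quality score: bias plus weighted sum of features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def rescore(table, w, prune_below=0.0, boost=1.0):
    """Drop pairs predicted below `prune_below`; add boost * score to the rest."""
    kept = []
    for p in table:
        q = predict(w, p.features)
        if q < prune_below:
            continue  # prune a low-quality phrase pair
        kept.append(PhrasePair(p.source, p.target, p.features,
                               p.log_prob + boost * q))
    return kept
```

For example, fitting on toy labels generated as `1 + 2*x1 - x2` recovers weights close to `[1, 2, -1]`. With those weights, a pair whose features are `[0, 2]` predicts `-1` and is pruned at the default threshold. Adding the score in the log domain is just one plausible way to "boost" a pair. Another option would be to expose the score as an extra phrase-table feature for the decoder's weight tuning.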


Similar resources

Integrating a Discriminative Classifier into Phrase-based and Hierarchical Decoding

Current state-of-the-art statistical machine translation (SMT) relies on simple feature functions which make independence assumptions at the level of phrases or hierarchical rules. However, it is well-known that discriminative models can benefit from rich features extracted from the source sentence context outside of the applied phrase or hierarchical rule, which is available at decoding time. We...

A Feature-rich Supervised Word Alignment Model for Phrase-based Statistical Machine Translation

Word alignment plays an important role in statistical machine translation (SMT) systems. The output of word alignment can be used to build a phrase table, which is the core model in the decoding of new sentences. Most current SMT systems use GIZA++, a generative model, to automatically align words from sentence-aligned parallel corpora. GIZA++ works well when large sentence-aligned corpora are ...

(Hidden) Conditional Random Fields Using Intermediate Classes for Statistical Machine Translation

Generative translation models are one of the major components of Statistical Machine Translation (SMT). As in other fields, where the transition from generative to discriminative training resulted in higher performance, it seems likely that translation models should be trained in a discriminative way. But due to the nature of SMT with large vocabularies, hidden alignments, reordering, and large...

Lattice rescoring methods for statistical machine translation

Modern statistical machine translation (SMT) systems include multiple interrelated components, statistical models, and processes. Translation is often factored as a cascaded series of modules such that the output of one module serves as the input to the next; this is the SMT pipeline. Simplifying assumptions, limited training data, and pruning during search mean that the hypothesis produced by ...

NTT - NAIST SMT Systems for IWSLT 2013

This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2013 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems: forest-to-string, hierarchical phrase-based, and phrase-based with pre-ordering. Individual SMT systems include data selection for domain adaptation, rescoring using recurrent ne...

Publication date: 2010